Introduction

a)  Nature  of  the  problem  (from  original  text) 

A  full-field  digital  mammography  system  has  been  developed  by  Fischer  Medical  systems 
in  collaboration  with  the  University  of  Toronto.  This  scanning  slot  digital  mammography 
system provides 50 µm, 12-bit pixels with inherently better contrast than that of
conventional mammograms. The advent of digitally acquired mammograms offers the
possibility  of  further  improvements  in  early  breast  cancer  detection.  Specifically,  digital 
acquisition  systems  decouple  the  process  of  x-ray  photon  detection  from  image  display  by 
using  a  primary  detector  that  directly  quantifies  transmitted  photons.  This  allows  digital 
systems  to  be  more  efficient  in  utilization  of  radiation  dose.  Digital  systems  also  allow  a 
wide  dynamic  range  so  that  a  wider  range  of  tissue  contrast  can  be  appreciated.  Subtle 
contrast  differences  can  be  amplified  and  the  distinction  between  benign  and  malignant 
might  be  increased.  The  new  scanning  slot  digital  mammography  system  has  the  further 
advantage  of  reduced  scatter  compared  with  both  conventional  and  phosphor  plate 
technologies.  Furthermore,  digital  systems  have  the  capacity  to  bring  revolutionary 
advantages  to  breast  cancer  detection  and  management:  1)  image  processing  for  increased 
lesion  conspicuity;  2)  computer-aided  diagnosis  for  enhanced  radiologic  interpretation;  3) 
teleradiology,  or  image  transmission,  as  a  means  of  bringing  world-class  expertise  to 
community  hospitals  and  remote  areas;  4)  improved  image  access  and  communication 
through  digital  image  archiving  and  transmission;  and  5)  dynamic,  or  "real  time"  imaging 
for  use  during  biopsy  and  localization  procedures. 

However, there are limitations to both laser-printed film and electronic displays, the two
possible display methods for digital mammography. The best quality film printers can only
display 87 µm pixels in an 8" x 10" printing of the digital data. This would not provide
sufficient spatial bandwidth for the available data. These printers may also lack sufficient
greyscale bandwidth. The best possible 2560x2048 pixel monitors can generate 170-680 nits
of luminance without pixel bloom. To gain access to the full grey scale bandwidth,
monitor display would require intensity windowing, and to view the image at the full 50
µm spatial resolution, roaming and zooming would be necessary. Clearly, any display
modality requires compromises that will affect diagnostic accuracy and interpretation speed.

b)  Background  of  previous  work  (from  original  proposal) 


For  a  number  of  years,  the  Medical  Image  Presentation  research  group  at  UNC-CH  has 
been  exploring  various  issues  concerning  the  display  of  medical  images.  Early  on  we 
addressed  the  issues  of  standardization  of  display  devices  to  assure  legitimate  comparison 
of  various  display  methods  under  investigation.  The  display  is  perceptually  linearized  so 
that  each  intensity  step  in  the  acquired  image  is  displayed  as  an  equally  perceptible  step  in 
the grey levels of the display [Pizer 1981, 1987, 1989, Johnston 1985, Rogers 1987]. In
addition, our group, under another grant (R01 CA44060), has developed and
experimentally  evaluated  the  ergonomic  and  cognitive  aspects  of  electronic  workstations. 
We  constructed  a  prototype  workstation  called  FilmStrip  using  a  single  2048x2560  pixel 
high-brightness  monitor,  a  very  simple  interaction,  and  an  extremely  fast  image  display 
time  (0.1  sec).  A  controlled  subject  experiment  was  used  to  evaluate  FilmStrip  relative  to 
film  and  alternator  [Beard  1993].  All  reports  were  of  clinically  acceptable  accuracy.  Based 
on  our  experimental  results,  we  are  95%  confident  that  FilmStrip  is  no  more  than  1.5 
minutes  faster  and  no  more  than  30  seconds  slower  than  film.  This  is  the  first  time  a 
radiology  workstation  has  been  shown  to  be  as  fast  as  film  for  interpretation  of  medical 
images  under  clinically  realistic  conditions.  We  have  conducted  a  subsequent  experiment 
showing that a lower cost version of FilmStrip called FilmStriplet can also be clinically
viable  with  sufficient  training  [Beard  1993]. 

Under  a  medical  image  presentation  program  project  grant,  (P01-CA47982),  we  have  been 
exploring  different  image  processing  methods,  specifically  various  versions  of  the  Contrast 
Limited  Adaptive  Histogram  Equalization  algorithm,  and  have  developed  an  experimental 
method  to  optimize  the  parameters  for  a  given  enhancement  algorithm  that  takes  into 
account  the  deleterious  effects  of  image  noise  and  that  does  not  require  the  performance  of  a 
full  clinical  trial  [Puff,  1992].  This  work  has  involved  the  conduct  of  a  number  of  image 
quality  assessment  experiments. 

Under  the  previously  described  interactive  Digital  Mammography  Development  Group 
grant,  Gray  Scale  Image  Processing  For  Digital  Mammography,  (R01  CA  60193),  we  are 
conducting  preliminary  experiments  to  determine  the  effect  of  the  variable  amount  of 
radiographically  dense  breast  tissue,  the  mammographic  characteristics  of  various  lesion 
types,  and  the  location  of  lesions  within  the  breast  on  the  choice  of  appropriate  intensity 
windows  and  other  image  processing  algorithms  selected  for  electronic  viewing  of 
mammograms.  The  results  of  this  investigation  will  also  give  us  some  indication  of  the 
number  of  intensity  windows  that  might  be  useful,  or  needed,  for  display  of  the  recorded 
digital  information. 

c)  Purpose  of  present  work 

The  purpose  of  this  study  is  to  determine  experimentally  the  diagnostic  accuracy  and 
interpretation  speed  of  the  available  display  methods. 

d)  Methods  of  approach 

We  propose  to  conduct  an  ROC  study  involving  the  best  available  display  methods,  one 
representative  of  a  film  based  display,  and  one  using  the  best  available  state-of-the-art 
electronic  workstation. 

Body 

a)  Accomplishments  to  date 

1. To achieve the goals of this research, we propose using digitally acquired
mammograms. Availability of the clinical digital units has been continuously delayed
because of detector upgrades and manufacturing problems. However, a unit was installed
at UNC Hospitals in April of 1997. As of this date we have acquired 76 patient studies
with an expected average of 5 patients per day. A number of technical issues have yet to be
solved before any observer experiments can be carried out. We have also acquired a
Kodak laser printer that can print at 50 µm/pixel in an 8 x 10 format.

2. During the first two years of this grant, a number of changes in the state of the art of
monitor technology have occurred. a) High brightness/resolution monitors, although
commercially available, have not been as readily available as once promised. There are
manufacturing problems in quality assurance and meeting performance specifications. We
have evaluated a number of different brands in our laboratory and in collaboration with Dr.
Hans Roehrig at Univ. of Arizona and Dr. Hartwig Blume at Philips Medical. As a result of
these extensive evaluations, we have purchased two DataRay and two Orwin monitors. To
achieve the maximum displayable grey levels, we have installed the electronics from Dome
(10 bits grey level) and are presently fine-tuning the user interface and installing software
for the soft copy workstation.

The actual ROC observer studies will not begin until sometime in the 4th year.

2. We have completed observer studies with preset intensity windows selected for masses
and  calcifications.  Our  results  showed  statistically  significant  improvement  in  detection  of 
both  features  with  specified  values  for  the  intensity  windows  [Pisano  1997].  We  have 
developed  improved  intensity  windowing  methods  that  automatically  determine  the 
appropriate  intensity  windowing  range  individually  for  each  mammogram.  This  method  is 
presently  undergoing  observer  studies  for  laboratory  evaluation.  Upon  completion  of  these 
studies,  we  will  be  ready  to  incorporate  automatic  window  selection  into  the  system  for 
image  display  on  the  workstation  and  to  film.  This  research  is  also  partially  supported  by 
NIH  R01-CA60193. 

3. We completed an experiment to determine the effect of display luminance range on the
detection of mammographic features. This observer study used film displayed with
maximum luminance at 10, 20, 30, 200, and 600 fL to simulate the luminance range of
typical and high brightness monitors compared to the lightbox. The results show no
statistically significant difference in detection of masses by the observers [Hemminger 1997].

4. We are pursuing research in the area of display function standardization, so that
softcopy images are presented as similarly as possible to film images. To this
end we are working with Fischer Medical to standardize the display to match the Kodak
printer. Mr. Hemminger is helping author the DICOM (ACR/NEMA) display function
standard.

b)  Research  to  be  accomplished 

1. Implement the UNC electronic mammography workstation for display of the digital
mammograms.

2. Redesign the experimental protocol for improved and more efficient data collection
and analysis for the comparison of soft copy vs hard copy digital mammograms and
comparison of digital to conventional screen film mammograms.

3.  Collect  patient  full-field  digital  mammograms. 

4. As a result of the delay in delivery of a digital mammographic acquisition system, the
slow development of state-of-the-art high brightness monitors, and lack of access to
clinical digital mammograms, we have been delayed in starting the observer
studies. Because of the delay in availability of digital images, we expect to request an
extension into an 05 year with no new funding. During the 05 year extension we propose
completing the observer studies and data analysis and reporting the results.


Conclusions 

Although  the  acquisition  of  digital  mammograms  has  been  delayed  due  to  delivery 
problems  of  the  new  Digital  Mammogram  systems,  we  have  made  significant  progress  in 
the  following  areas: 

1. Helped identify display issues with the digital mammographic system.

2.  Demonstrated  that  maximum  luminance  level  of  the  display  system  may  not  be  a  major 
factor. 

3.  Showed  that  intensity  windowing  is  an  effective  image  enhancement  method. 

4.  Developed  an  automatic  intensity  windowing  method. 

5.  Developed  the  software  tools  for  an  effective  electronic  mammographic  workstation. 
Abstract


Purpose: 

To  determine  whether  intensity  windowing  (IW)  improves  detection  of  simulated 
masses  in  dense  mammograms. 

Materials  and  Methods: 

Simulated masses were embedded in dense mammograms digitized at 50
microns/pixel, 12 bits deep. Images were printed with no windowing applied and with
nine  window  width  and  level  combinations  applied.  A  simulated  mass  was  embedded 
in  a  realistic  background  of  dense  breast  tissue,  with  the  position  of  the  mass  (against 
the  background)  varied.  The  key  variables  involved  in  each  trial  included  the  position  of 
the  mass,  the  contrast  levels  and  the  IW  setting  applied  to  the  image.  Combining  the 
10  image  processing  conditions,  4  contrast  levels  and  4  quadrant  positions  gave  160 
combinations.  The  trials  were  constructed  by  pairing  160  combinations  of  key  variables 
with 160 backgrounds. The entire experiment consisted of 800 trials. Twenty
observers were asked to identify the quadrant of the image in which the mass was
located.

Results: 

There  was  a  statistically  significant  improvement  in  detection  performance  for 
masses  when  the  window  width  was  set  at  1024  with  a  level  of  3328. 

Conclusion: 

IW  should  be  tested  in  the  clinic  to  determine  whether  mass  detection 
performance  in  real  mammograms  is  improved. 




Background  and  Significance 


Effective  image  display  allows  for  an  improvement  in  the  clarity  of  structural 
details.  Mammography,  especially  in  patients  with  dense  breasts,  is  a  low  contrast 
examination  that  might  benefit  from  increased  contrast  between  malignant  tissue  and 
normal  dense  tissue.  Image  processing  may  allow  for  improved  visualization  of  details 
within medical images [1]. Our overall aim is to improve the accuracy of mammography
with image processing since as many as 10% of palpable breast cancers are not
visible with standard mammographic techniques [2].

Contrast  enhancement  methods  accentuate  or  emphasize  particular  objects  or 
structures  in  an  image  by  manipulating  the  gray  levels  in  the  display.  This  is  done  by 
imposing  a  predetermined  transformation  that  amplifies  the  contrast  between  structures 
and  effectively  “resamples”  the  recorded  intensities  to  enhance  the  properties  of  the 
displayed  image  [3].  These  methods  are  not  designed  to  increase  or  supplement  the 
inherent  structural  information  in  the  image,  but  simply  improve  the  contrast  and 
theoretically  enhance  particular  characteristics  [4].  Intensity  Windowing  (IW)  is  an 
image  processing  technique  that  involves  the  determination  of  new  pixel  intensities  by  a 
linear  transformation  which  maps  a  selected  band  of  pixel  values  onto  the  available 
gray  level  range  of  the  display  system  [4]. 
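
The linear mapping described above can be sketched in a few lines. This is only an illustration, not the authors' implementation: it assumes that "level" denotes the center of the window (consistent with the unprocessed default of width 4096 and level 2048 covering the full 12-bit range), that values outside the window are clipped, and that the display offers an 8-bit output range (`out_max=255`). The function name `intensity_window` is hypothetical.

```python
def intensity_window(pixels, width, level, out_max=255):
    """Map the band [level - width/2, level + width/2] linearly onto [0, out_max].

    Assumptions (not stated in the text): `level` is the window center and
    pixel values outside the window are clipped to the ends of the range.
    """
    low = level - width / 2.0
    out = []
    for p in pixels:
        frac = (p - low) / width            # position of the pixel inside the window
        frac = min(1.0, max(0.0, frac))     # clip values that fall outside the window
        out.append(frac * out_max)
    return out


# 12-bit sample values pushed through a window of width 1024 centered at 3328.
windowed = intensity_window([0, 2816, 3328, 3840, 4095], width=1024, level=3328)
```

With these settings, everything at or below 2816 maps to the darkest display level and everything at or above 3840 maps to the brightest, so the full display range is spent on the selected band.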

Many  investigators  have  studied  the  application  of  digital  image  processing 
techniques  to  mammography.  McSweeney  tried  to  enhance  the  visibility  of 
calcifications  by  using  edge  detection  for  small  objects,  but  never  reported  any  clinical 
results  [5].  Smathers  showed  that  intensity  band-filtering  could  increase  the  visibility  of 
small  objects  compared  to  images  without  such  filtering  [6].  Chan  used  unsharp 
masking  (an  edge-sharpening  technique  used  in  photography  for  many  years)  to 
remove  image  noise  for  computerized  detection  of  calcification  clusters  [7].  Chan  noted 
that  while  these  techniques  improved  detection,  the  improvements  may  have  been 
greater  if  the  observers  had  been  trained  to  make  diagnoses  from  the  processed 
mammograms  rather  than  the  unprocessed  (normal)  mammograms  [8].  Hale  et  al. 
have  applied  non-specific  contrast  and  brightness  adjustment  through  Adobe 
Photoshop®  to  digitized  mammograms  and  have  found  improved  performance  by 
radiologists  in  determining  the  likelihood  of  malignancy  of  mammographically  apparent 
lesions  [9].  Yin  et  al.  showed  that  nonlinear  bilateral  subtraction  is  useful  in  the 
computer-detection  of  mammographic  masses  [10,11]. 

Previous  work  at  UNC  has  explored  the  use  of  Intensity  Windowing  (IW)  and  the 
Adaptive  Histogram  Equalization  (AHE)  family  of  algorithms  in  mammography  and 
computed  tomography  [12-14].  We  have  previously  described  a  laboratory-based 
method  for  testing  the  efficacy  of  an  image  processing  algorithm  in  improving  the 
detection  of  masses  in  dense  mammographic  backgrounds  [15].  With  that  method, 
upon  which  our  current  work  is  based,  radiologists  and  non-radiologists  exhibit  similar 
trends  in  detection  performance.  While  non-radiologists  did  not  perform  as  well  as 
radiologists  overall,  the  two  populations  displayed  parallel  increases  and  decreases  in 
performance due to image processing.

The  experiments  described  in  this  paper  were  performed  to  determine  whether  IW 
could  improve  the  detection  of  simulated  masses  in  dense  mammograms  in  a 
laboratory  setting.  While  the  scope  of  this  paper  is  limited  to  the  evaluation  of  observer 
performance  using  our  established  experimental  paradigm,  it  may  be  interesting  for 
follow-up  work  to  evaluate  these  results  with  respect  to  measures  proposed  by  other 
authors, such as the conspicuity measure proposed by Revesz and Kundel [16-18].

Materials  and  Methods 

The  experimental  paradigm  reported  here  is  based  on  the  model  we  have 
previously  described  and  allows  for  the  laboratory  testing  of  a  range  of  parameter 
values (in this case, window width and level) [15]. The experimental subject is shown a
series  of  test  images  that  consist  of  an  area  of  a  dense  mammogram  with  a  simulated 
mass  embedded  in  the  image  in  one  of  its  four  quadrants.  The  observer's  task  is  to 
determine  in  which  quadrant  the  mass  is  located.  The  test  images  are  displayed  in 
both  the  processed  and  unprocessed  format,  and  the  contrast  of  the  object  is  varied 
from  quite  easy  to  detect  to  impossible  to  detect. 

A  computer  program  randomly  selected  one  of  40  background  images  and  rotated 
that  background  to  one  of  four  orientations.  The  40  background  images  of  256x256 
pixels  each  were  extracted  from  actual  clinical  film  screen  mammograms  digitized  using 
a  Lumisys  digitizer  (Lumisys,  Inc.  Sunnyvale,  CA)  with  a  50  micron  sample  size  with  12 
bits  (4096  values)  of  density  data  per  sample.  The  images  contained  relatively  dense 
breast  parenchyma.  These  were  determined  to  be  dense  by  a  radiologist  expert  in 
breast  imaging  (EDP).  Only  areas  that  contained  relatively  uniformly  dense  tissue  were 
included,  with  adjacent  fatty  areas  specifically  excluded.  These  areas  were  selected 
because  they  are  most  likely  to  hide  soft  tissue  masses  in  the  clinical  setting.  They 
were  known  to  be  normal  by  virtue  of  at  least  three  years  of  normal  clinical  and 
mammographic  follow-up.  They  were  selected  by  a  breast  imaging  radiologist  from 
digitized  film  screen  craniocaudal  or  mediolateral  oblique  mammograms.  Figure  1 
shows  one  of  the  backgrounds.  The  density  of  this  background  as  displayed  in  this 
figure  is  typical  of  those  used  in  the  experiments. 

These  40  images  and  four  orientations  provided  160  different  dense  backgrounds. 

Next,  the  program  added  a  phantom  feature  (a  mass)  into  the  background.  The  image 
was  processed  with  IW  to  yield  the  final  stimulus. 

Mammographic  masses  were  simulated  by  blurring  (via  convolution  with  a 
Gaussian  kernel  with  a  standard  deviation  of  2.0  pixels)  a  disk  that  is  approximately 
5mm in diameter when printed on film (1.51 degree visual angle at a 38 cm viewing
distance).  The  masses  were  added  at  four  fixed  contrasts.  The  four  contrasts  added 
were,  in  digitized  density  units,  20,  40,  80,  and  160  digital  driving  levels  (DDLs).  While 
contrast  is  commonly  defined  as  a  change  in  luminance  with  respect  to  the  background 
luminance,  we  used  only  the  change  in  luminance  in  this  experiment  because  the 
change  was  independent  of  the  background  luminance.  This  is  because  contrast  was 
represented  in  log  luminance  (i.e.  the  DDLs  corresponded  to  optical  density),  and  since 
all the study backgrounds were in the luminance range where Weber’s law holds,
adding a mass of constant density equates to a constant change in contrast,
independent of the background luminance. DDLs do not correspond directly to
Just-Noticeable-Differences (JNDs). In fact, they correspond to fractions of JNDs for the
case of the display system used in these experiments.
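
The mass simulation described above (a disk blurred with a Gaussian kernel of standard deviation 2.0 pixels, then added to the background at a fixed contrast) might be sketched as follows. Several details are assumptions made for illustration: the disk diameter of 31 pixels (roughly 5 mm at the 160 micron printed spot size), the kernel truncation at three standard deviations, edge replication during blurring, the uniform stand-in background, and the choice to add the mass to the digital values. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights over [-radius, radius]."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(width - 1, max(0, x + k - radius))
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def simulated_mass(size, diameter_px, sigma=2.0):
    """Unit-height disk blurred by a separable Gaussian (rows, then columns)."""
    c = (size - 1) / 2.0
    r = diameter_px / 2.0
    disk = [[1.0 if (x - c) ** 2 + (y - c) ** 2 <= r * r else 0.0
             for x in range(size)] for y in range(size)]
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma + 0.5))
    blurred = blur_rows(disk, kernel)
    return transpose(blur_rows(transpose(blurred), kernel))


def add_mass(background, mass, top, left, contrast_ddl):
    """Return a copy of the background with the mass added at the given contrast."""
    out = [row[:] for row in background]
    for dy, mass_row in enumerate(mass):
        for dx, m in enumerate(mass_row):
            out[top + dy][left + dx] += contrast_ddl * m
    return out


mass = simulated_mass(size=48, diameter_px=31)
background = [[3000.0] * 256 for _ in range(256)]   # uniform stand-in, not a real mammogram
stimulus = add_mass(background, mass, top=40, left=40, contrast_ddl=80)
```

Because the mass is added as a fixed number of DDLs, its peak sits exactly `contrast_ddl` above the local background regardless of the background value, which mirrors the argument in the text that the contrast is independent of background luminance.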

Although the simulated structures were not entirely realistic, they did
possess the same scale and spatial characteristics as actual masses typically found at
mammography.  Figure  2  shows  an  example  of  a  simulated  mass.  Figure  3  shows  a 
typical  background  image  with  the  mass  added  to  it.  We  used  simulated  features 
instead  of  real  features  so  that  we  could  have  precise  control  over  the  location, 
orientation,  and  figure  to  background  contrast  of  the  masses. 

A  three  by  three  (3x3)  grid  of  window  and  level  parameters  was  designed  based 
on  the  results  of  pilot  preference  studies  done  with  two  radiologists  who  specialize  in 
breast  imaging.  In  these  pilot  studies,  the  two  radiologists  reviewed  dense 
mammograms  with  real  clinical  lesions  that  were  judged  to  be  difficult  to  visualize  using 
standard  film  screen  mammography.  There  were  7  images  of  this  type  reviewed  with 
70  combinations  of  window  width  and  level  applied.  The  radiologists  scored  each 
combination  of  values  as  showing  no  change  over  the  standard  image,  improving  the 
visibility  of  the  lesion,  or  worsening  its  visibility. 

For  experiment  1,  the  grid  spanned  all  the  likely  optimal  settings  (windows  of  512, 
768,  1024  and  levels  of  3072,  3328,  3584).  Thus,  there  were  a  total  of  10  IW  settings 
(including  the  default  unprocessed  image,  with  Window  width=  4096,  Level  =  2048)  that 
were  applied  throughout  experiment  1. 

To  confirm  the  results  of  the  first  experiment  and  to  examine  additional  IW 
settings,  experiment  2  was  performed.  Experiment  2  also  included  the  unprocessed 
(wide  open  window  width)  condition  and  9  other  IW  conditions.  The  combinations  of 
parameters  evaluated  in  Experiment  2  were  as  follows:  window  width  of  640  with  levels 
of  3456,  3584  and  3840;  window  width  of  1024  with  levels  of  3200,  3328  and  3584; 
and window width of 1536 with levels of 2944, 3072, and 3328.

The digital images were printed onto standard 14X17 inch single emulsion film (3M
HNC Laser Film, 3M, St. Paul, MN) using a Lumisys Lumicam film printer (Lumisys Inc,
Sunnyvale,  CA).  Each  original  50  micron  pixel  was  printed  at  a  spot  size  of  160 
microns,  which  produced  film  images  4X4  centimeters,  resulting  in  an  enlargement  by  a 
factor  of  3.2.  The  background  and  target  are  magnified  together.  The  radiologist 
observers  in  the  pilot  experiment  reported  that  the  magnification  did  not  make  the 
backgrounds  unrealistic.  Forty  images  were  printed  per  sheet  of  film.  The  images  were 
randomly  ordered  into  an  8X5  grid  on  each  sheet  of  film.  Both  the  film  digitizer  and  film 
printer  were  calibrated,  and  measurements  of  the  relationship  between  optical  density 
on  film  and  digital  units  on  the  computer  were  determined  in  order  to  generate  transfer 
functions  describing  the  digitizer  and  film  printer.  In  order  to  maintain  a  linear 
relationship  between  the  optical  densities  on  the  original  analogue  film  and  the  digitally 
printed  film,  we  calculated  a  standardization  function  that  provided  a  linear  matching 
between  the  digital  and  printer  transfer  functions.  This  standardization  function  was 
applied when printing the films to maintain consistency between the optical
densities of the original mammography film and those reproduced on the digitally
printed films. The film printer produces films with a fixed relationship between digital
input and optical density: a digital input range of 0 to 4095 corresponds to an optical
density range of 3.35 OD to 0.13 OD, respectively.
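
The print geometry and the digital-to-density relationship quoted above can be checked with a short calculation. The enlargement factor and printed image size follow directly from the stated pixel and spot sizes. The straight-line interpolation between the two quoted endpoints (0 at 3.35 OD, 4095 at 0.13 OD) is an assumption for illustration, since the text gives only the endpoints; the function names are hypothetical.

```python
PIXEL_UM = 50      # original digitized pixel size, in microns
SPOT_UM = 160      # printed spot size, in microns
IMAGE_PX = 256     # side length of each test image, in pixels


def magnification(pixel_um=PIXEL_UM, spot_um=SPOT_UM):
    """Enlargement factor introduced by printing each pixel at a larger spot size."""
    return spot_um / pixel_um


def printed_size_cm(n_pixels=IMAGE_PX, spot_um=SPOT_UM):
    """Side length of the printed image in centimeters (10,000 microns per cm)."""
    return n_pixels * spot_um / 10000.0


def digital_to_od(value, od_at_min=3.35, od_at_max=0.13, max_value=4095):
    """Optical density for a digital input, ASSUMING a linear mapping between endpoints."""
    return od_at_min + (od_at_max - od_at_min) * value / max_value


factor = magnification()          # 160 / 50
side_cm = printed_size_cm()       # 256 pixels at 160 microns each
od_dark = digital_to_od(0)
od_light = digital_to_od(4095)
```

The printed side length comes out slightly above 4 cm, which matches the "4X4 centimeters" figure in the text to the stated precision.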

There  were  20  observers  for  each  experiment.  These  were  graduate  students 
from  the  medical  school,  biomedical  engineering  department,  and  computer  science 
department.  Performance  bonus  pay  was  provided.  Observers  selected  the  quadrant 
of  the  image  that  they  thought  contained  the  mass.  All  images  contained  a  mass. 
Observers  were  told  to  make  their  best  guess  if  they  could  not  see  the  simulated  mass 
with  certainty. 

Films  were  displayed  in  a  darkened  room  on  a  standard  mammography  lightbox 
that  was  masked  so  that  only  the  grid  of  images  on  the  film  was  illuminated.  Observers 
could  move  closer  to  the  image  and  could  use  a  standard  mammography  magnifying 
glass,  as  desired.  The  observers  were  trained  for  the  task  through  the  use  of  two  sets 
of  stimulus  image  films  with  instructive  feedback  before  actually  starting  the  experiment. 

Both  experiments  had  the  same  basic  design.  The  order  of  the  presentation  of  the 
stimuli  was  counterbalanced  so  as  to  eliminate  any  systematic  effect  of  non-important 
variables.  All  160  possible  combinations  of  processing  condition  (10  IW  levels), 
contrast  level  (4  contrasts)  and  location  of  the  masses  (4  quadrants)  were  used  in  the 
experiment.  The  experiment  was  designed  to  have  5  self-contained  blocks,  in  which  all 
160  combinations  appeared.  The  intent  was  to  have  the  observer  see  all  the 
combinations  in  each  block,  in  case  the  observer  was  unable  to  complete  the 
experiment.  In  fact,  all  observers  did  complete  the  experiment.  There  were  40 
backgrounds  and  4  possible  rotations  of  each  background,  for  160  possible  background 
patterns.  For  each  block,  a  different  background  pattern  was  assigned  uniquely  to  each 
of  the  160  possible  combinations.  The  assignment  was  different  for  each  block.  Each 
observer  looked  at  a  total  of  800  images,  which  were  the  160  possible  combinations, 
each  superimposed  on  5  backgrounds. 
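
The block structure described above can be illustrated with a small trial generator. It enumerates the 160 combinations (using the experiment 1 settings), gives each combination a distinct background pattern within every block, and repeats this for 5 blocks to reach 800 trials. This is a sketch under assumptions: simple seeded shuffling stands in for the counterbalancing scheme, whose details the text does not give, and random shuffles make the per-block assignments differ only with very high probability rather than by construction. All names are hypothetical.

```python
import itertools
import random

# Unprocessed default plus the 3x3 grid of experiment 1: 10 IW conditions.
IW_CONDITIONS = [(4096, 2048)] + [(w, l)
                                  for w in (512, 768, 1024)
                                  for l in (3072, 3328, 3584)]
CONTRASTS = (20, 40, 80, 160)       # DDLs above background
QUADRANTS = (0, 1, 2, 3)
# 40 backgrounds x 4 rotations = 160 background patterns.
BACKGROUNDS = [(b, rot) for b in range(40) for rot in range(4)]


def build_trials(n_blocks=5, seed=0):
    """Return trials in which every block contains each combination exactly once."""
    rng = random.Random(seed)
    combos = list(itertools.product(IW_CONDITIONS, CONTRASTS, QUADRANTS))
    trials = []
    for block in range(n_blocks):
        patterns = BACKGROUNDS[:]
        rng.shuffle(patterns)               # new combination-to-background pairing per block
        block_trials = [
            {"block": block, "iw": iw, "contrast": c, "quadrant": q, "background": bg}
            for (iw, c, q), bg in zip(combos, patterns)
        ]
        rng.shuffle(block_trials)           # randomized presentation order (assumption)
        trials.extend(block_trials)
    return trials


trials = build_trials()
```

Keeping each block self-contained means that an observer who stops early still contributes complete sets of all 160 combinations, which is the rationale given in the text.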

Observers  were  instructed  to  take  breaks  after  each  block  of  stimuli,  and  more 
often if necessary. No time limit was imposed on the observers' viewing duration of the
test  images.  Overall,  the  experiment  took  2  hours  for  each  observer,  divided  into  two 
sessions  of  approximately  60  minutes  each.  The  two  sessions  were  always  scheduled 
on  two  different  days  within  a  week  of  each  other. 


Data  Analysis  Overview 

Classical sensory discrimination theory predicts that since contrast values were
varied from virtually imperceptible to highly apparent, a typical S-shaped curve will
describe the data [2]. At values where the contrast is very low, observers will on
average guess randomly and get approximately 25% right since there are four choices.
Where the contrast is very high, they will almost always get the correct answer. This
relationship between log10 of the contrast of the object relative to the background
intensity and the percent correct can be described with a probit model. This model is
typically used to describe the relationship between a continuous predictor (log contrast)
and a discrete variable (percent correct), and assumes that the curve between them is
described by the cumulative Gaussian distribution.


Probit models were fit for each subject and enhancement condition using contrast
(DDLs of mass above background) as the predictor. The probability that a subject gets
a correct answer is given by the following equation:

$$\Pr\{\text{correct}\} = \frac{1}{4} + \left(1 - \frac{1}{4}\right)\,\Phi\!\left[\frac{x - \mu_{ij}}{\sigma_i}\right]$$

Here i indexes subjects, and j indexes enhancements, with x representing the
log(contrast). Classical psychophysical theory and experimental results strongly
support the use of the logarithmic transform, as did our data. In the experiments
reported here, we used x=log10(# DDLs above background). The subscripts in the
equation  indicate  that  for  each  subject  a  single  spread  parameter  was  estimated  (which 
pools  across  all  stimuli  and  conditions).  Also,  for  each  subject,  a  separate  location 
parameter  was  estimated  for  each  enhancement  condition.  With  10  processing 
conditions,  this  implies  a  total  of  10  location  parameter  estimates  and  one  spread 
parameter for each subject. Our assumption that there is a common spread
parameter makes sense biologically, since it corresponds to linearity of the perceptual
mapping.  It  is  advantageous  to  an  organism  to  have  the  same  amount  of  change  in 
stimulus  produce  a  constant  perceptual  response,  and  that  is  precisely  how  the  human 
visual  system  works  over  a  wide  range. 
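
As a concrete illustration of the model, the sketch below evaluates the four-alternative probit curve (a 25% guessing floor plus three quarters of a cumulative Gaussian in log10 contrast) and the binomial log-likelihood that a fitting routine would maximize. The parameter names `mu` and `sigma` stand for the location and spread parameters; the example counts and parameter values are invented for illustration and are not data from the study, and the fitting procedure actually used is not described in the text.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pr_correct(contrast_ddl, mu, sigma, chance=0.25):
    """Probability of choosing the right quadrant at a given contrast (in DDLs)."""
    x = math.log10(contrast_ddl)
    return chance + (1.0 - chance) * phi((x - mu) / sigma)


def log_likelihood(data, mu, sigma):
    """Binomial log-likelihood; data holds (contrast_ddl, n_correct, n_trials) tuples."""
    total = 0.0
    for contrast, n_correct, n_trials in data:
        p = pr_correct(contrast, mu, sigma)
        p = min(max(p, 1e-12), 1.0 - 1e-12)     # guard against log(0)
        total += n_correct * math.log(p) + (n_trials - n_correct) * math.log(1.0 - p)
    return total


# Hypothetical counts for one subject and one processing condition.
example = [(20, 6, 20), (40, 9, 20), (80, 15, 20), (160, 19, 20)]
ll = log_likelihood(example, mu=1.7, sigma=0.3)
p_at_mu = pr_correct(10 ** 1.7, mu=1.7, sigma=0.3)   # contrast at the inflection point
```

At the inflection point the Gaussian term equals one half, so the predicted proportion correct is 0.25 + 0.75 x 0.5 = 0.625, halfway between guessing and perfect performance.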

The location parameter, μ, is the mean of the corresponding Gaussian distribution
and the inflection point of the sigmoidal probit curve. Processing conditions that
improve detection will cause this parameter to be smaller, and the curve will shift to the
left, or equivalently if viewed from the perspective of the same contrast value, the curve
shifts upward. This occurs because lower contrast levels are required to spot the
object. When the processing of the image makes detection harder, higher contrast
levels are needed to locate the mass, and the curve shifts to the right. The values of σ,
the spread parameter, correspond to the slope of the curve. Small values of σ
correspond to steep slopes.

The  probit  analysis  summarized  the  relationship  between  contrast  and  proportion 
correct  for  each  subject  and  processing  condition.  To  compare  the  processing 
conditions  and  to  examine  the  effect  of  window  width  and  level,  further  analysis  was 
needed. To include both the location and the spread parameters from the probit analysis,
we defined an overall measure, θ_ij = μ_ij + σ_i (for subject i and processing condition j),
which corresponds to 88% correct.
Because we were interested in the improvement offered by IW, we measured the
"success" of a processing condition by calculating, for each subject, the difference
between the θ score for the unprocessed image and the θ score for that condition
(unprocessed minus processed). A large positive
difference-of-θ score reflects improved performance, because it indicates better
detection with processed images than with unprocessed images.
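
The 88% figure and the sign of the difference score can be checked with a short calculation. This is a sketch under stated assumptions: the probit curve is taken to be rescaled for a 4-AFC task (25% guessing rate), the overall measure is taken to be the location plus the spread parameter, and the difference is taken as unprocessed minus processed so that positive values mean improvement, as in Tables 1 and 2. The function names and the example parameter values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_correct_at_theta(n_alternatives=4):
    """Proportion correct one spread unit above the location parameter."""
    guess = 1.0 / n_alternatives
    return guess + (1.0 - guess) * normal_cdf(1.0)


def theta(mu, sigma):
    """Overall measure: location plus spread."""
    return mu + sigma


def difference_score(theta_unprocessed, theta_processed):
    """Positive when the processed image needs less contrast."""
    return theta_unprocessed - theta_processed


print(f"{proportion_correct_at_theta():.3f}")  # about 0.88 for 4-AFC

# Hypothetical subject: processing lowers the location parameter.
score = difference_score(theta(45.0, 15.0), theta(38.0, 15.0))
print(score)
```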

For  each  experiment,  two  analyses  were  performed  using  this  outcome  measure. 
To keep an overall experiment-wide Type I error rate of .05, a repeated measures
analysis of variance was done at the .04 level, with a set of nine t-tests at the .01/9
level.


Repeated  measures  analysis  of  variance  (ANOVA)  is  a  technique  used  to  analyze 
data  in  which  many  measurements  were  made  on  each  subject.  It  allows  one  to 
examine  the  effect  of  processing  conditions  and  their  interactions,  while  allowing  for  the 
dependence of measurements taken on the same observers. With the difference in θ
scores  as  the  outcome,  and  window  width  and  level  as  the  predictors,  the  repeated 
measures  ANOVA  model  was  fitted. 

The  model  can  be  thought  of  as  a  response  surface  in  three  dimensions  with 
performance  plotted  against  window  width  and  level.  A  flat  surface  would  mean  that 
window  width  and  level  had  no  effect  on  the  outcome.  The  major  hypothesis  tested  in 
the  ANOVA  is  equivalent  to  asking  the  question  "Is  the  response  surface  flat?".  If  it  is 
not  flat,  the  step-down  hypotheses  allow  one  to  ask  what  shape  the  surface  is,  whether 
it  is  curved  in  both  directions  (quadratic  by  quadratic  trends),  curved  in  one  direction 
and  sloped  in  the  other  (quadratic  by  linear  trends),  or  sloped  in  both  directions  (linear 
by  linear  trends).  A  peak  in  the  surface  means  that  there  is  one  image  processing 
technique that is better than any other. Conversely, a difference score equal to
zero for a given intensity windowing setting would correspond to no difference between
the processed image and the unprocessed image. That is the hypothesis the t statistics test.
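
The trend language above can be illustrated with orthogonal polynomial contrasts on a 3 x 3 grid of cell means. This is only a sketch of the idea, not the repeated measures ANOVA that was fitted: it computes contrast values without any test against between-subject variability, it assumes three equally spaced levels of each factor, and both example grids are hypothetical.

```python
# Orthogonal polynomial coefficients for three equally spaced levels.
LINEAR = (-1, 0, 1)
QUADRATIC = (1, -2, 1)


def trend_contrast(cell_means, width_coef, level_coef):
    """Value of a (width trend) by (level trend) interaction contrast.

    cell_means is a 3x3 grid: rows index window width, columns window level.
    """
    return sum(
        width_coef[i] * level_coef[j] * cell_means[i][j]
        for i in range(3)
        for j in range(3)
    )


# Hypothetical grids of mean difference scores.
flat_surface = [[0.1, 0.1, 0.1] for _ in range(3)]
peaked_surface = [
    [0.0, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0],
]

for name, grid in (("flat", flat_surface), ("peaked", peaked_surface)):
    quad_quad = trend_contrast(grid, QUADRATIC, QUADRATIC)
    lin_lin = trend_contrast(grid, LINEAR, LINEAR)
    print(f"{name}: quadratic x quadratic = {quad_quad:.2f}, linear x linear = {lin_lin:.2f}")
```

A flat surface gives zero for every trend contrast, while a single interior peak loads on the quadratic by quadratic contrast and not on the linear by linear one.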

Results:  Experiment  1 

The  repeated  measures  ANOVA  revealed  that  there  was  a  significant  interaction 
between window width and level (p = .0001, Greenhouse-Geisser ε = .8347). To examine the nature of
this  interaction,  a  series  of  step-down  tests  was  planned.  There  was  a  significant 
interaction  between  a  quadratic  trend  in  window  width  and  a  quadratic  trend  in  level 
(F=31.08,  p=.0001).  Because  the  quadratic  by  quadratic  interaction  was  significant,  no 
further tests were examined. A quadratic by quadratic trend means that the surface
was curved with respect to both window width and level, and that the shape of the
curve along one factor depended on the value at which the other was held fixed (Figures 4 and 5).


At the overall .01 level, the differences between each enhancement condition and
the unenhanced condition were examined. The null hypothesis is that there is no difference
between the mean θ for the unenhanced condition and that for an enhancement condition. There are
nine such hypotheses, corresponding to the nine enhancements. A Bonferroni
correction to control the overall error rate made each individual alpha level .0011. Four
settings of intensity windowing made finding the masses significantly harder, three
made the task significantly easier and two made no significant difference. The settings
that made the task easier are window width 1024 with level 3328, window width 768
with level 3584 and window width 1024 with level 3584 (Table 1).

Results:  Experiment  2 

Again, the repeated measures ANOVA showed that there was a significant
interaction between window width and level (p<0.0001, F=60.9; Figures 6 and 7). As
in Experiment 1, a quadratic by quadratic interaction was significant (p<0.0001,
F=32.61). Table 2 shows the results of nine two-sided t-tests. Only one image
processing setting resulted in significantly better performance than the unprocessed image,
namely a window width of 1024 with a window level of 3328 (p<0.0001). Seven of the
settings were not significantly different from the unprocessed image. One setting was
significantly worse.
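
The classification of the nine settings follows from comparing each p-value in Table 2 with the Bonferroni-corrected level of .01/9. The sketch below reproduces that bookkeeping from the tabulated values. The function name and labels are hypothetical, and the p-values are used exactly as printed in the table.

```python
ALPHA_PER_TEST = 0.01 / 9  # Bonferroni-corrected level, about .0011

# (window level, window width, mean difference in theta, p-value) from Table 2.
TABLE_2 = [
    (3456, 640, 0.04, 0.0239),
    (3584, 640, -0.05, 0.0215),
    (3840, 640, -0.31, 0.0001),
    (3200, 1024, 0.04, 0.0142),
    (3328, 1024, 0.14, 0.0001),
    (3584, 1024, 0.01, 0.6155),
    (2944, 1536, -0.02, 0.1255),
    (3072, 1536, 0.06, 0.0045),
    (3328, 1536, 0.06, 0.0013),
]


def classify(mean_diff, p_value, alpha=ALPHA_PER_TEST):
    """Label a setting relative to the unprocessed image."""
    if p_value >= alpha:
        return "no significant difference"
    return "better" if mean_diff > 0 else "worse"


counts = {}
for level, width, mean_diff, p_value in TABLE_2:
    label = classify(mean_diff, p_value)
    counts[label] = counts.get(label, 0) + 1
    print(f"level {level}, width {width}: {label}")

print(counts)
```

Note that the setting with p = 0.0013 falls just short of the corrected level, which is why it is counted among the seven settings that did not differ significantly.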

The probit model predicts that IW will increase detection of masses. For example,
at the contrast level of 40 DDLs above background, which is the contrast level tested
that was nearest to the observers' detection threshold, these results predict that the
feature detection rate would change from 51% to 68% for the conditions of Experiment
1, and from 52% to 67% for the conditions of Experiment 2 (Figures 5 and 7).

Discussion 


These  results  are  encouraging.  This  is  the  first  experiment  in  mammography  that 
demonstrates  that  an  algorithm  can  improve  the  detection  of  a  simulated  mass  placed 
in  a  dense  mammogram.  At  the  same  time,  it  is  obviously  important  to  choose  the 
window  width  and  level  with  care  since  performance  can  be  significantly  degraded  if 
inappropriate  parameters  are  chosen. 
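
Why the choice of width and level matters can be seen from a minimal sketch of linear intensity windowing. The mapping below is an assumption (a common convention in which the window is centred on the level, and values outside it are clipped), not necessarily the exact implementation used in these experiments. The function name and the example pixel values are hypothetical.

```python
def intensity_window(pixel, width, level):
    """Map a recorded pixel value to a display intensity in [0, 1].

    Values at or below level - width/2 display as 0, values at or above
    level + width/2 display as 1, and values in between are scaled linearly.
    """
    low = level - width / 2.0
    high = level + width / 2.0
    if pixel <= low:
        return 0.0
    if pixel >= high:
        return 1.0
    return (pixel - low) / (high - low)


# A hypothetical mass 40 units above a background value of 3328.
background, mass = 3328, 3368

wide = intensity_window(mass, 4096, 2048) - intensity_window(background, 4096, 2048)
narrow = intensity_window(mass, 1024, 3328) - intensity_window(background, 1024, 3328)
print(f"displayed contrast, width 4096 / level 2048: {wide:.4f}")
print(f"displayed contrast, width 1024 / level 3328: {narrow:.4f}")

# A poorly placed window clips both values to the same display intensity.
clipped = intensity_window(mass, 512, 2048) - intensity_window(background, 512, 2048)
print(f"displayed contrast, width 512 / level 2048: {clipped:.4f}")
```

Narrowing the window around the intensities of interest raises the displayed contrast, while a window that excludes those intensities removes the contrast entirely.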

What  do  these  results  mean  for  clinical  mammographers?  Will  we  be  using  this 
technology  in  the  clinic  in  detecting  lesions  in  dense  mammograms?  The  use  of 
graduate  student  observers  and  the  use  of  simulated  masses  in  this  study  might 
incorrectly  predict  the  performance  of  radiologists  in  detecting  real  masses  in  real 
patients. However, we have demonstrated previously that graduate student performance at this
task  parallels  the  performance  of  experienced  mammographers  [15].  Evaluation  by 
radiologists  on  real  patients  will  determine  the  ultimate  utility  of  this  algorithm  in  the 
clinical  setting.  Because  we  have  used  real  clinical  images  and  we  have  simulated 
masses  using  relatively  realistic  stimuli,  we  are  optimistic  that  these  methods  will 
improve  clinical  performance  and  that  radiologists  will  be  using  IW  to  help  them  in 
determining  whether  mammograms  of  women  with  dense  breasts  really  do  contain 
masses. 

One could argue that our methods are limited, since the small areas studied may make
IW more useful than it would be in larger areas. By magnifying the original 12.8 mm X
12.8 mm image to 40 mm X 40 mm during the printing process, the variation in density
may be reduced compared to the variation of an actual 40 mm X 40 mm cropped
section of a mammogram, since a third fewer samples are included. In a similar
experiment [23], we found that the variation difference between cropped
mammographic  sections  of  different  sizes  from  uniformly  dense  areas  of  mammograms 
was  small,  and  unlikely  to  have  a  significant  effect  on  feature  detection  of  masses  when 
using  this  experimental  paradigm. 

In addition, ideally one would report the standard deviation, σ, of the pixel
values of the background as a parameter affecting the probability of detection of the
mass embedded in the background. While we report these data in all other experiments
using this paradigm, unfortunately we are unable to do so for this experiment because
of an omission.

Digital  mammography  will  be  available  in  the  clinic  very  soon.  It  is  obvious  that 
image  processing  will  be  used  to  optimize  the  visibility  of  lesions  in  digital 
mammograms  [20].  Ideally,  any  image  processing  algorithm  that  might  be  useful  will  be 
tested on real patients in that setting. That will be an expensive and time-consuming
process that will involve real patients making clinically important decisions about their
own breast health, including the advisability of biopsy, lumpectomy and mastectomy.
Ideally,  before  this  technology  arrives  in  the  clinic,  radiologists  will  have  some  idea  of 
which  category  of  algorithms  to  test  in  that  setting.  This  work  is  intended  to  give 
radiologists  preliminary  data  to  narrow  the  choices  that  might  be  useful  before  the 
expensive  clinical  tests  are  undertaken.  This  approach  suggests  not  only  which 
algorithms  might  help  clinically  but  which  parameter  settings  most  improve  detection. 

One  could  take  the  approach  that  the  IW  dials  should  be  spun  until  a  clinically 
pleasing  image  is  displayed.  This  approach  might  be  acceptable  and  even  convincing 
to  many  radiologists.  It  is  at  least  possible  that  what  pleases  radiologists  in  terms  of 
the  aesthetics  of  the  image  might  not  improve  the  detection  performance  of  their  visual 
systems,  and  in  fact,  could  worsen  their  detection  performance.  This  project  was 
intended  to  be  more  rigorous  in  exploring  the  window  widths  and  levels  that  might  be 
useful  in  the  most  challenging  areas  of  the  breast,  namely  the  dense  parts.  We  have 
performed similar experiments on the AHE class of algorithms [21,22].

This experiment does not address how IW would affect the appearance of fatty
areas of the breast, and the detectability of lesions in those parts. We would not want to
apply an algorithm that degrades performance in areas of the breast where sensitivity is
quite high with current technology. There are two possible technical responses to that
concern. First, IW could be applied selectively to only the dense areas as an adjunct to
the more standard appearing mammogram, with the radiologist pointing and clicking on
the areas where windowing would be desirable. Alternatively, the IW could be
individualized to the patient's unique intensity histogram so that the areas of the image
to be processed could be selected by the computer itself. In fact, ideally the
computer could be programmed to choose an individual IW setting for each portion of
the mammogram so that contrast was preserved in all portions of the image. Ongoing
experiments in our laboratory are currently exploring the latter possibility.

Of course, our results to date cannot estimate the exact frequency of false positive
diagnoses when intensity windowing is used. Many alternative forced-choice tests (in our
case, 4-AFC) yield proportion correct as the primary outcome. Macmillan and
Creelman discussed methods for converting proportion correct in this setting to a value
of d', the sensitivity parameter of an ROC analysis [23]. The particular choice of
conversion depends on side conditions concerning the nature of any rater bias. Given
the characteristics of the study design, subjects and training, we believe that superior
proportion correct will translate into superior d'. If this is true, the practical value of
intensity windowing must be tested in a clinical setting. Then ROC analysis will allow
separate analysis of a reader's sensitivity and payoff function on the performance of
the technique as part of a diagnostic system.
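
One such conversion can be sketched as follows. This is not a reproduction of the cited tables. It assumes the standard equal-variance Gaussian model with an unbiased observer, under which the proportion correct in an m-alternative task is the integral of φ(x − d') Φ(x)^(m−1) over x, and it inverts that relationship numerically. The function names, the integration limits and the step count are illustrative choices.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def proportion_correct(d_prime, m=4, steps=4000):
    """Proportion correct in m-AFC for an unbiased observer.

    Trapezoidal integration of pdf(x - d') * cdf(x)**(m - 1).
    """
    lo, hi = -8.0, 8.0 + d_prime
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(x - d_prime) * normal_cdf(x) ** (m - 1)
    return total * h


def d_prime_from_pc(pc, m=4, tol=1e-6):
    """Invert proportion_correct by bisection; requires 1/m < pc < 1."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if proportion_correct(mid, m) < pc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: sensitivity implied by 88% correct in a 4-AFC task.
print(f"d' = {d_prime_from_pc(0.88, m=4):.2f}")
```

Because proportion correct increases monotonically with d' under this model, a higher proportion correct always maps to a higher d', which is the sense in which superior proportion correct translates into superior d' when rater bias is negligible.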

The  testing  of  these  methods  on  patients  with  palpable  and  mammographically 
detected  lesions  has  been  funded  by  the  National  Cancer  Institute  and  the  Department 
of  Defense,  and  will  be  ongoing  over  the  next  few  years  at  UNC  and  Thomas  Jefferson 
University  Hospital.  We  expect  to  evaluate  both  Intensity  Windowing  and  Contrast 
Limited  Adaptive  Histogram  Equalization  (CLAHE)  in  the  clinical  setting  to  determine 
whether  or  not  these  algorithms  improve  the  performance  of  radiologists  in  detecting 
and  characterizing  breast  lesions. 
CAPTIONS:

Figure  1:  An  example  of  a  dense  normal  background  taken  from  a  patient’s 
mammogram  and  used  in  the  reported  experiments. 

Figure  2:  An  example  of  a  simulated  mass.  The  actual  size  of  the  masses  used 
in  the  experiments  was  only  5  mm. 

Figures  3  a  &  b:  A  dense  background  with  a  simulated  mass  embedded  in  it  in 
the  right  upper  quadrant  (arrow).  Figure  3a  is  the  default  unprocessed 
image  with  window  width  4096  and  level  2048.  Figure  3b  is  the  same 
image  with  window  width  1024  and  level  3328. 

Figure  4:  Interpolated  predicted  values  from  repeated  measures  ANOVA  for 
Study 1: difference in θ value versus window width and window level.

Figure  5:  Estimated  detection  probability  from  Study  1  for  window  width  of  1024 
and  window  level  of  3328  versus  unprocessed  condition.  The  shift  in  the 
curve  to  the  left  reflects  improved  detection. 

Figure  6:  Interpolated  predicted  values  from  repeated  measures  ANOVA  for 
Study 2: difference in θ value versus window width and window level.

Figure  7:  Estimated  detection  probability  from  Study  2  for  window  width  of  1024 
and  window  level  of  3328  versus  unprocessed  condition.  The  shift  in  the 
curve  to  the  left  reflects  improved  detection. 

Table 1: Summary of differences between unenhanced and enhanced theta for
Study 1.

Table  2:  Summary  of  differences  between  unenhanced  and  enhanced  theta  for 
Study  2. 
Table 1: Summary of differences between unenhanced and enhanced θ
for Study 1. Positive values in the mean difference in θ column correspond to
improved detection of simulated masses.

Window Level   Window Width   Mean Diff in θ   Std Dev   p-value
3072           512            -.50             .108      .0001
3072           768            -.32             .093      .0001
3072           1024           [illegible]      .089      .0001
3328           512            -.11             .074      .0001
3328           768            .04              .087      .0706
3328           1024           .18              .104      .0001
3584           512            -.03             .097      .1716
3584           768            .14              .082      .0001
3584           1024           .12              .121      .0004

Table 2: Summary of differences between unenhanced and enhanced θ
for Study 2. Positive values in the mean difference in θ column correspond to
improved detection of simulated masses.

Window Level   Window Width   Mean Diff in θ   Std Dev   p-value
3456           640            0.04             0.08      0.0239
3584           640            -0.05            0.09      0.0215
3840           640            -0.31            0.09      0.0001
3200           1024           0.04             0.07      0.0142
3328           1024           0.14             0.08      0.0001
3584           1024           0.01             0.09      0.6155
2944           1536           -0.02            0.07      0.1255
3072           1536           0.06             0.08      0.0045
3328           1536           0.06             0.07      0.0013

Figure 4: Study 1
Figure 6: Study 2